So you're a person who appreciates things for flavor. Do you like pi? Was your pgp encryption made in an industrial sweatshop and devoid of that secret ingredient, love? Want a personalized encryption tool for totally awesome secret base activities? Well– here you go. I baked it myself.
This came mostly from a sort of joke I thought of one evening as a response to random pondering of the usual problem with simple ciphers. A simple cipher is a character changing method where one character is substituted for another and this rule is followed for all instances of that character. For example, in the Caesar Cipher, all letters are simply shifted ahead or backwards x number of spaces. Using x = 1, A becomes B, B becomes C, and so on. This obscures normal legibility but leaves a relatively transparent contour of the encoded information plainly legible for anybody with time and desire to crack it, which makes for a pretty bad encryption. Taking one step further from just shifting to substitution, we yield another step away from legibility but there's still issues of blank space, word length, single letter words like I and a, and other such simple attacks to yield legible information, further aiding reading simple ciphers without the cipher key as the shoreline of obscurity reduces.
One day I had done some minor experimentation with python, ciphers, and text analysis and found that such subtle flaws occur sometimes non-intuitively and that anyone who messes with ciphers can develop an odd amount of insight into sequences of any cipher they are familiar with somewhat intuitively and start to see patterns. This is especially apparent in smaller samples with linear patterns, where the topography becomes apparent.
Notoriously, such an observation was used by Polish mathematician Marian Rejewski to exploit a procedural flaw by German operators in the Enigma Code who would use repetitions in communicating indicator settings for the machines which encoded the messages.
German operators would use a six-character key composed of a three-character key repeated twice with the first encoding the second. For example, if the key was ABCABC, the first ABC would encode the second ABC to, using the Caesar cipher for example, BCD; rendering ABCBCD. This key would then be input once more to finally prime the machine to a condition ready to send the thusly encrypted message. This procedure was in theory to reduce garbles, but introduced a fatal vulnerability. Employing a mathematical theorem which dictates that two sets are conjugate if they have the same cycle structure, Rejewski noticed that the two sets of three in the six characters were conjugate in multiple phases, yielding a formula for the relationships between the 1st and 4th, 2nd and 5rd, and 4th and 6th characters.
Noticing such analogous symmetries in my own experiments, I was compelled to think about how I could encode noise into a cipher so that such patterns vanish. Pondering this over tea at night I thought, “It would be funny to use pi to cipher letters." The next day I made this.
In this algorithm we will use ASCII switching and re-encode each digit-combo for a character into itself + a digit of pi. This, not being fancy and writing something that has larger possible variance, yields us an end state where any character can be encoded as up to ten different characters (+0-9, itself if +0 and 9 other characters).
Using it
does require you to muster the intellectual curiosity to apply to the question, 'what is replit' (or even better, 'what is Pycharm'), opening a project, copy/pasting, hitting a play button, and following the prompts which appear.
However, it
does not strictly speaking require you to know python or code, you just need to not be intimidated by 80 very legible in human terms lines of code. If you want to experiment and can summon the courage, you can make edits to the backend code (
if you can imagine changing your name but still being yourself and responding to your new name, you can refactor code) and change how it works or write your own key generation algorithms following the examples provided.
First you will need an environment to run your code in, I suggest using Replit, which is an in browser virtual-environment which is great for beginners:
Step 1: Go to
replit.com
Step 2: Make an account, after this go back to replit's main page
Step 3: In the center of the screen is a little blue button with a cross that says 'Create Repl', click it and select python #FIXME MORE DETAIL
Step 4: Copy paste the code in key lime pi #FIXME ATTACH FILE ETC
Step 4: either import or create a new .txt and copy paste the contents of the following text document named key3.txt
Step 5: Hit the green play button, and then follow the prompts.
The mode it is in default is Key Type 1, AKA Pi Mode. I encourage you to try it out first, then I will explain how this works.
Before taking input from the user, the computer generates a number key comprised of digits of pi. In math terms, this is pretty simple, but in computer terms, it wound up being slightly more complex than my initial assumption.
Python has a built-in pi function in the math module, but math.pi only renders 16 digits of pi (3.141592653589793). The problem with this is that our key needs to be long enough to encrypt phrases of any reasonable length you might anticipate. Having a sixteen-digit key limits us to encrypting very few characters since the method we will be using (ASCII switching) will use one of these digits per character we encode. Thusly, using math.pi would limit us to 16 characters, which is too short for most purposes. Exceeding this limit would crash the script without any additional statements either looping the key or writing exceptions for handling such errors and demanding a valid input.
The second problem with working with pi is that not every algorithm for generating it is the same. Many algorithms only approximate pi and break down at the scales I wanted. Seeking to overcome this barrier, I read documentation until I found that the mpmath module uses the
Chudnovsky algorithm to approximate an infinite sequence of equations which render 14 digits per term. As of the 21st of March, 2022, the Chudnovsky algorithm is the algorithm which is the current world record holder for most digits of pi ever calculated by Emma Haruka Iwao. It took 128 vCPU and 864GB of RAM to simulate, 633 TB of memory to store, and 158 days to calculate 100 trillion digits. The
Chudnovsky algorithm also claims the last ten world records for pi.
Fun fact: In 1949 the world record for most pi digits calculated was broken by John Wrench and Levi Smith with 1,120 digits calculated using a desk calculator. It would have taken them forever to perform this algorithm!
Also its worth nothing one should
check out Ramanujan, who discovered the pi formula π=2√3∞∑n=0(−1)n(2n+1)3n that the Chudnowsky brothers built their algorithm on. He is an interesting figure who had almost no formal training in mathematics but was nonetheless a prodigy who contributed thousands of functions, equations, and theorems to mathematics. His life was very interesting and well worth reading into.
Much more importantly, according to the author of the code, Fredrik Johannson, this algorithm allows us to set a desired accuracy length. So in my code, I used a resolution of 1050 to grab at least 1k digits of pi. If you want to read more in depth on how this works,
you can here.
Now that we can generate a thousand digits of pi, we need to store them– so we do this with a variable, just like middle school algebra. A variable is simply a name and some information attached to that name. In this case the variable 'un_scrubbed_pi' is assigned to a string of digits of pi. There is however, a problem. This string still includes the decimal point In '3.14'. Since periods are not numbers and can't operate in math, we need to discard the decimal point. This is done by borrowing a function from a module named 're'. Using the method re.sub(), we can set a character whitelist that only allows through numbers and replaces other characters found with "" (no space).
Finally, having pre-loaded a string of digit-characters, we can begin with the rest of the script. Note: The function which performs this step has different modes will always make a key by the same name, even when we don't use pi (other key types will be explained later). This way, many types of key generation can be employed without changing the encryption function.
Easy as 1, 2, 3
- The computer takes an input from the user, for example "ABC" and stores it
- It then iterates down the characters in the input, converting them into ASCII values, which are numbers that represent characters. So A, B, and C become 65, 66, and 67 respectively.
- Character by character the computer takes the converted ASCII value, references pi, and adds a digit of pi to the ASCII value of the letter. So if you input 'ABC', the computer checks A, converts it to the number 65, adds 3, the first digit of pi, then re-encodes this back from the new ASCII value to the character equal to 68 (D). It then checks B, converts it to 66, adds 1, and renders out 'C'. This process repeats with C and it outputs 'G'. Because we are adding a different number each time the output 'AAA' yields 'DBE,'' leaving a noisy and illegible result. Pretty simple in the grand scheme of things, but simplicity is always a great starting place.
Some sample inputs and their corresponding outputs:
Hello == Kfpmt
I love Python == L!pp{n"V~wmww
The quick brown fox jumps over the lazy dog == Wii!v~kip#gzx~w#hr"$pwstv#wygy)yhg(te{$'eup#
The next mode does something similar except it uses text as the key generation mechanism instead of pi. It functions similar to the method for converting letters in the user input string into their ASCII values.
The second mode takes a text sample and converts it letter by letter to ASCII, then stores these values in a variable by appending them onto the end every single iteration. It then uses this in lieu of the pi key to iterate down, adding digits from this key to the user's input phrase, and then re-rendering out the resultant characters.
In the first iteration I just simply used a string, which in layman's terms are characters in-between apostrophes or quotes in code. To be slightly more complex, strings are both sets of characters and also a 'data type'. In python there are many data types and they all handle logic differently.
That's a lesson for another day but the simplest thing to know is that in Python, '1' is a string, 1 is an integer, and 1.0 is a float number. Asking a computer to print '1+1' (parrot a string) will get you '1+1' while asking a computer to print 1+1 (add two integers) will get you 2. Beyond that, you'll really have to read more about data types if you would like to have a better understanding of this concept. It's all basically like
The Treachery of Images by René Magritte but for computers and logic. Ceci n'est pas un pipe, indeed.
As I was saying, in the first iteration I used a hard coded string to test, but similar to the math.pi issue the fatal error with this is that it's just a short phrase and translating anything too long yielded an indexing overflow error (went too far and crashed). For this to be a proper tool and not just a toy proof of concept, it needs to have power. So, in the second iteration, I wrote a key generation mode which reads a text document, strips it of characters, and converts these into a giant ASCII chain which it then uses for the key.
The first iteration (or rather, zero-th) had an even more grievous issue: I wrote it in a transparently empirical manner and so the whole encryption script and key script were one unit, which makes for a terrible lock when your key is taped to it. Next I rewrote it functionally, so that you can split up the key function and the encrypt/decrypt function and they can worth independently with anything that either uses the generated keys or returns a correctly named key with the correct properties to do math with. Taking into account the textual key that was added in the second iteration of the functional version, this makes for three packets which can be split in addition to the material encrypted with it.
Some fun notes from the design process:
Turns out there was a lot of invisible issues that arose when I had someone else user test and I discovered some fatal errors that I had to re-design around.
Issues: There was one loop flow that accidentally would trap it in decode mode under certain conditions, I didn't notice because I probably would reflexively restart the program while working on it. I thought I checked for this but must have missed it. This was easily corrected in two seconds by re-constructing my loop logic.
Much worse problem: I asked the user to send me something they encoded and when I decoded it, it was gibberish. I asked them to repeat this process, gibberish again. I started thinking about what a user could do that could make this happen, so I tried adding a blank space, modulating up instead of down, modulating twice in either direction, and so on. I couldn't find anything.
Even more confusingly, I didn't get the same output in Replit and in Pycharm. It was the first time I ever had code work in one environment not the other so this was mind boggling at first, but this was a clue.
Inspired by the example of John Couch Adams, who used Kepler's Laws of Planetary Motion to mathematically predict Neptune based on a wobble in the orbit of Uranus he perceived in a data set of planetary observations; I set out to find the issue using science.
I started by examining the two outputs and noting the differences. Right away, I noticed that things rendered correctly up until a certain point, and that this point changed in different test phrases. A clue, evidence of a threshold. I then simplified the signal chain (we were exchanging messages via Facebook messenger, which can add artifacts to data, so I eliminated this stage) to eliminate the possibility of artifacts being introduced to the encoded phrase (common with blank space for example) that might throw off the decryption process. The issue still persisted. So I pursued the threshold clue.
In Replit and PyCharm, I rendered a matrix of characters according to their ASCII values in both environments using a for loop and the method chr() which takes a number argument and returns it's ASCII character.
This sounding quickly confirmed my suspicion, and it turns out Replit uses a reduced UTF-8 character set which only has the range 32-126, then a gulf on either end extending the range of we will occupy with +/-9 being the extremes of our operations (and generally we are encoding + then decoding -). PyCharm in comparison renders 0-128, with a few blanks in the range of 8-10, position 13, and position 32. Ironically, it was the error in the loop logic that stuck it on decode which made it instantly visible because it was rendering country codes that I hadn't seen before and which replit didn't have, which prompted the sounding in the first place.
This basically shredded information as soon as something near the upper limit crossed that (xyz, as upper case letters go first, then lower, in ASCII value) so I had to rewrite it with a mod loop to have a stable web version
It also had other small issues like the command phrases for encode and decode and quit would also print out their translation, which was not elegant looking, so I cleaned this up.
I get why, to save data, but it was funny because now I have a forked version of this, one online and one off. The online version feels weaker although the noise is compressed so I need to analyze what these signals look like. The offline version one is hilarious in that many attempts to migrate the data by anything less than a specialist will mangle it. Note: Since writing this, I have continued to develop the forked version and abandoned the other one for the lack of broad spectrum compatability.
Weirdly it means it's a radar-gun for detecting if front end character set is in range 33-126 and sometimes back end information of in browser IDES/ poetry environments.
In the future, I will re-write the decode function so that it wheels with modulus similar to the encrypt function. At this point I probably need to separate character reference into its own function and rewrite all of the debug print statements so that they are easier to interact with and less granular.
That's pretty much it.
Easy as pi.
Note: Automatic download may not be supported, if so: click the link > right click > save page as